Return-Path: <icon-group-sender>
Received: from kingfisher.CS.Arizona.EDU (kingfisher.CS.Arizona.EDU [192.12.69.239])
by baskerville.CS.Arizona.EDU (8.8.7/8.8.7) with SMTP id QAA08544
for <icon-group-addresses@baskerville.CS.Arizona.EDU>; Tue, 17 Mar 1998 16:26:20 -0700 (MST)
Received: by kingfisher.CS.Arizona.EDU (5.65v4.0/1.1.8.2/08Nov94-0446PM)
id AA17517; Tue, 17 Mar 1998 16:26:19 -0700
Message-Id: <199803172152.OAA16672@orpheus.gemini.edu>
From: swampler@noao.edu (Steve Wampler)
Date: Tue, 17 Mar 1998 14:52:34 MST
In-Reply-To: Mark Evans's mail message of Mar 13, 4:06pm.
X-Mailer: Mail User's Shell (7.2.3 5/22/91)
To: icon-group@optima.CS.Arizona.EDU
Subject: Re: Letter Probabilities
Errors-To: icon-group-errors@optima.CS.Arizona.EDU
Status: RO
Content-Length: 1502
On Mar 17 at 10:16am, Mark Evans writes:
} I think that we have here one of those classic execution-time vs.
} memory-space tradeoffs that they teach about in computer science
} classes.
}
} Mark
}
} Paul Abrahams wrote:
} >
} > Using the probabilities to construct a weighted string and then
} > selecting a random character from the string does have one unaesthetic
} > property, in my book: the length of the string grows exponentially with
} > the precision of the probabilities. For that reason I'd opt for a
} > binary search, which takes 5 probes no matter what the precision.
} >
} > I wonder if there's another way of attacking this problem.
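[Editorial note: Paul's binary-search idea — a cumulative probability table probed in O(log n) steps — can be sketched as follows. The thread is about Icon, but Python is used here purely for illustration; the letter set and probabilities are hypothetical, not from the thread. With 26 letters, bisection needs at most ceil(log2 26) = 5 probes regardless of how precise the probabilities are.]

```python
import bisect
import itertools
import random

# Hypothetical letter probabilities (must sum to 1.0).
letters = ['a', 'b', 'c', 'd']
probs   = [0.50, 0.25, 0.15, 0.10]

# Cumulative distribution table: [0.50, 0.75, 0.90, 1.00].
cum = list(itertools.accumulate(probs))

def weighted_pick():
    """Pick a letter by binary-searching the cumulative table."""
    r = random.random()                      # uniform in [0, 1)
    # bisect gives the index of the first cumulative value > r,
    # in O(log n) probes -- the table never grows with precision.
    return letters[bisect.bisect_left(cum, r)]
```

Note the space cost: the table holds one entry per distinct letter, independent of how many decimal places the probabilities carry — which is exactly the advantage Paul claims over the expanded weighted string.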
I'm not sure I understand this. The upper bound on the weighted string
is the length of the input text itself. In fact, the input text
itself is the *perfect* weighted string! There is no need to reduce
the string to probabilities and then expand the probabilities back
into a different string - all this does is (a) slow things down and
(b) reduce the 'accuracy' of the probabilities. ((b) isn't all that
important when producing 'random' text, I suppose.)
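[Editorial note: Steve's observation — that the input text is itself the perfect weighted string — reduces to a one-liner. A minimal Python sketch for illustration (the sample string is hypothetical); picking a character uniformly at random from the text yields each character with exactly its frequency in the sample, no probability table needed.]

```python
import random

def random_char(text: str) -> str:
    # The text is its own weighted string: a uniform pick over
    # positions returns each character in proportion to how often
    # it occurs in the sample.
    return random.choice(text)

sample = "abracadabra"   # 'a' should come out roughly 5 times in 11
draw = random_char(sample)
```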
Now, for *really, really* large input samples, this becomes
inefficient (memory-wise). It's certainly not an issue for the
20K character lengths I've seen mentioned here. I regularly
process 10-MB strings with Icon.
--
Steve Wampler - swampler@gemini.edu [Gemini 8m Telescopes Project (under AURA)]
The gods that smiled at your birth are now laughing openly. (Fortune Cookie)